Efficient Path Kernels for Reaction Function Prediction

Authors

  • Markus Heinonen
  • Niko Välimäki
  • Veli Mäkinen
  • Juho Rousu
Abstract

Kernels for structured data are rapidly becoming an essential part of the machine learning toolbox. Graph kernels provide similarity measures for complex relational objects, such as molecules and enzymes. Graph kernels based on walks are popular due to their fast computation, but their predictive performance is often unsatisfactory, while kernels based on subgraphs suffer from high computational cost and are limited to small substructures. Kernels based on paths offer a promising middle ground between these two extremes. However, computing path kernels has so far been assumed to be computationally too challenging. In this paper we introduce an effective method for computing path-based kernels: we employ a Burrows-Wheeler transform based compressed path index for fast and space-efficient enumeration of paths. Unlike many kernel algorithms, the index representation retains fast access to individual features. In our experiments with chemical reaction graphs, path-based kernels surpass state-of-the-art graph kernels in prediction accuracy.
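To make the notion of a path kernel concrete, the following is a minimal sketch and not the method of the paper: it enumerates labelled simple paths explicitly by depth-first search instead of using a Burrows-Wheeler transform based compressed index, and it takes the kernel value to be the dot product of path-count vectors. The graph encoding (an adjacency dict plus a label dict), the function names `path_features` and `path_kernel`, and the `max_len` bound are all assumptions made for illustration.

```python
from collections import Counter


def path_features(adj, labels, max_len):
    """Count label sequences of all simple paths with at most max_len nodes.

    adj    -- dict mapping a node to a list of its neighbours (assumed encoding)
    labels -- dict mapping a node to its label, e.g. an atom symbol
    """
    counts = Counter()

    def extend(node, visited, seq):
        counts[seq] += 1                  # every prefix is itself a path
        if len(seq) == max_len:
            return
        for nxt in adj[node]:
            if nxt not in visited:        # simple paths: no repeated nodes
                extend(nxt, visited | {nxt}, seq + (labels[nxt],))

    for start in adj:
        extend(start, {start}, (labels[start],))
    return counts


def path_kernel(g1, g2, max_len=3):
    """Dot product of the two path-count vectors (explicit feature map)."""
    f1 = path_features(*g1, max_len)
    f2 = path_features(*g2, max_len)
    return sum(c * f2[p] for p, c in f1.items())


# Toy graphs (hypothetical): a C-C-O chain and a C-O pair.
chain = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
pair = ({0: [1], 1: [0]}, {0: "C", 1: "O"})
print(path_kernel(chain, pair))
```

Explicit enumeration like this grows quickly with `max_len` and graph size, which is the cost that a compressed path index is meant to reduce while still allowing individual path features to be looked up.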


Similar resources

Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search

In this paper, a method for predicting time series is presented. Time series prediction is a process that predicts future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields of engineering, economics, etc. The main purpose of using different models for time series prediction is to make the forecast with...


Reaction Kernels - Structured Output Prediction Approaches for Novel Enzyme Function

Abstract: The enzyme function prediction problem is usually solved using annotation transfer methods. These methods are suitable in cases where the function of the new protein is previously characterized and included in a taxonomy such as the EC hierarchy. However, given a new function that is not previously described, these approaches arguably do not offer adequate support for the human expert. In t...


Congestion estimation of router input ports in Network-on-Chip for efficient virtual allocation

Effective and congestion-aware routing is vital to the performance of a network-on-chip. An efficient routing algorithm relies on the selection strategy it uses. If the routing function returns more than one permissible output port, a selection function is used to choose the best output port to reduce packet latency. In this paper, we introduce a new selection s...


Path Kernels and Multiplicative Updates

Kernels are typically applied to linear algorithms whose weight vector is a linear combination of the feature vectors of the examples. On-line versions of these algorithms are sometimes called “additive updates” because they add a multiple of the last feature vector to the current weight vector. In this paper we have found a way to use special convolution kernels to efficiently implement “multi...


Reaction kernels: predicting enzyme functions you have never seen before

Motivation: Enzyme function prediction is an important problem in post-genomic bioinformatics. There are two general methods for solving the problem: annotation transfer from a similar annotated protein, and machine learning approaches that treat the problem as classification against a fixed taxonomy, such as Gene Ontology or the EC hierarchy. These methods are suitable in cases where the funct...



Publication date: 2012